tion from constructing tests with particular types of items deleted, and
(4) exam construction or processing procedures which would raise test
quality for both Blacks and Whites.

Item differentiation levels, calculated as the difference in item
difficulty between high and low scorers (D value) and also as the
item-total correlation (r_it), were found to be lower for Blacks than for
Whites, partly because item-difficulty levels (P values) were lower for
Blacks. The highest item-differentiation values had corresponding
item-difficulty levels which were easier than the median difficulty levels,
indicating that the use of easier items should contribute to better item
differentiation for both Blacks and Whites. Black-White score differences
were reduced by construction of new tests using items of similar difficulty,
but test quality was also reduced. Both item differentiation and test
reliability were improved by the construction of tests using easier items or
more highly correlated items, with slight and varied changes in score
differences. The "best" items initially selected by a sequential procedure,
applying an internal criterion, were not the same as those selected by an
external criterion.

An empirical validation of the present tests on subsequent job performance
for both Blacks and Whites was recommended, as was a validation and comparison
on internal and external criteria of the alternative test construction
procedures identified.
FOREWORD



This study was initiated in response to a request from the Chief of
Naval Personnel (Pers-6) to determine the feasibility of developing
Enlisted Advancement Exams from items similar in difficulty for both
Black and White racial groups, as an approach to improving equal
opportunity in career growth for minority groups. Previous studies examined
item-difficulty levels both for entire racial groups (Robertson & Royle,
1976, TR 76-6) and for subgroups matched on total test score (Robertson &
Montague, 1976, TR 76-3A). This report, the third in a series, examines
item differentiation and test reliability for the present exams and for
modified exams using alternative item selection procedures.

The substantial and valuable assistance of the following persons is
gratefully acknowledged: Mr. William E. Montague and DP2 Suzanne Olson,
for data processing and computation; and Ms. Hazel F. Schwab, for clerical
support.

This study was performed under Exploratory Development Task Area
ZF55.521.031 (Career Performance and Selection).



J. J. CLARKIN 
Commanding Officer 
SUMMARY



Problem

Blacks are advanced to paygrades E-4 and above in smaller proportions
than Whites and score lower on the technical knowledge exam than do Whites.
It has been found that when exams were constructed only of items similar
in difficulty for both Blacks and Whites (to reduce total test score
differences), the items were concentrated in the difficult (i.e., guessing)
range. This prior finding suggested that such an approach would degrade
test quality.

Purpose

As a follow-on, the present study investigated test quality in terms
of item differentiation and test reliability. Questions specifically
addressed were: (1) what racial differences in item differentiation exist,
(2) what levels of item difficulty (P value) yield maximum item
differentiation for Blacks and Whites, (3) what impact constructing tests by
selecting particular types of items would have on item differentiation, and
(4) what exam construction or processing techniques would raise test quality
for Blacks and Whites.

Approach 

Item response data for exams of six occupational specialties across
four paygrades (i.e., 24 different exams) were analyzed as follows:

1. Racial differences in item differentiation were calculated as (a)
the difference in item difficulty between high and low scorers (D value)
and (b) the item-total score correlation (r_it value).

2. Levels of item difficulty yielding maximum item differentiation were
determined by comparing P values with corresponding D and r_it values.

3. Three types of modified tests were developed by selecting different
types of items: (a) items similar in difficulty for Blacks and Whites (SIM-P),
(b) those that were not extremely difficult (UPA-P), and (c) those that were
highly correlated (SEQUIN). Black-White score differences, item
differentiation, and test reliability values for these tests were compared
with those for the original test.

4. The SEQUIN item-selection procedure was applied to certain exams
using an on-job performance factor as a criterion. Items correlating highly
with internal (total score) and external (on-job performance) criteria were
compared.

Findings 

1. Item differentiation was generally lower for Blacks than for Whites,
partly because item-difficulty (P value) distributions are lower for Blacks
than for Whites (p. 7).

2. The highest item-differentiation values (D and r_it values) had
corresponding item-difficulty levels (P values) that were higher than the
median P values (of all items). This indicates that the use of easier items
should contribute to higher (i.e., better) item differentiation for both
Blacks and Whites (pp. 7 and 11).

3. Selecting items that were similar in difficulty for both Blacks
and Whites (SIM-P test) did reduce mean score differences between Blacks
and Whites, but it also reduced item differentiation and test reliability.
Selecting items that were easier for Blacks (UPA-P test) and those that
were highly correlated (SEQUIN test) resulted in slight and varied changes
in mean score differences and also increased item differentiation and
test reliability (p. 11).


4. The "best" items initially selected by the SEQUIN procedure by
applying an internal criterion were not the same as those selected by
applying an external criterion. This result raises new questions regarding
the relevance of internal-consistency type measures of test quality to
measures of subsequent job-relevant performance (p. 14).


Conclusions



1. Item differentiation and test reliability of advancement exams
could be improved for both Blacks and Whites by using the item selection and
construction procedures identified in this study.

2. Developing tests by using items similar in difficulty for Blacks
and Whites is not feasible, since it reduces test quality. However, developing
tests by eliminating excessively difficult items would improve test quality
and benefit Blacks.

Recommendations

The empirical validity of the present tests on subsequent job performance
should be compared between Blacks and Whites, and the alternative item
processing and construction procedures identified herein should be validated
and compared on internal and external criteria.
INTRODUCTION


Problem and Background

The Enlisted Advancement System is one of the Navy's major personnel
selection systems being studied to identify and alleviate any condition
that might be detrimental to equal opportunity in career growth for all
individuals and groups. Advancements to paygrades E-4 and above are
competitive and are based on several differentially weighted factors,
including the score obtained on a technical knowledge exam, which is
substantially weighted. A separate exam, comprising 150 multiple-choice
items, is developed for each of approximately 80 Navy ratings (i.e.,
occupational specialties) and for each paygrade within each rating.

It has been found that Blacks score lower than Whites on the technical
knowledge exams, and that a smaller proportion of Blacks than Whites are
advanced. To reduce the difference in scores, Robertson and Royle (1975)
investigated the feasibility of constructing exams containing only items
that were similar in difficulty for both Blacks and Whites. They concluded
that the construction of such tests could not be recommended, since the
items of similar difficulty were concentrated in the difficult range (i.e.,
in the guessing range). Although they found that differences in average
total test score between Blacks and Whites would be reduced in tests
constructed of this type of item, they suggested that such tests would
degrade test quality for both groups. Thus, one aspect of the problem is to
find ways of constructing advancement tests that provide similar competitive
opportunity for all groups, but without loss of test quality, as measured
by item differentiation or internal consistency-type reliability.

Purpose

This study investigated racial differences in test quality in terms
of item differentiation,¹ including the effects of alternative item
selection techniques.

The questions specifically addressed were:

1. What differences in item differentiation exist between Blacks
and Whites?

2. What P value levels yield maximum item differentiation for Blacks
and Whites?


3. What impact would constructing tests by selecting particular types
of items have on item differentiation and test reliability?

4. What exam construction or processing procedures would raise test
quality for Blacks and Whites?



¹The term "item differentiation" is used instead of the term typically
used in item-analytic studies, "item discrimination," to avoid confusion
in the context of racial discrimination.

METHOD

Data

Item response data² from the technical knowledge exams of the Series
61 (August 1972) advancement competitions were provided by the Naval
Examining Center (now the Naval Education and Training Program
Development Center; NETPDC). The ratings selected for analysis were those in
which minority group representation was relatively high. The six ratings
selected, in competition to paygrades E-4 through E-7, were:


Aviation Machinist's Mate (Jet Engine Mechanic) (ADJ)

Boatswain's Mate (BM) 

Boiler Technician (BT) 

Commissaryman (CS)

Hospital Corpsman (HM)

Machinist's Mate (MM)

Data (for Blacks and Whites only) for the 24 separate competing groups were
analyzed. Table 1 presents sample size, total test mean, and standard
deviation for each group.

Analysis 

Racial Differences in Item Differentiation 

Item differentiation is considered more important than item difficulty
in constructing tests from "good" items; that is, those that are neither
extremely easy nor difficult (e.g., P values between 40 and 80) and that
relate to the total test score either by a high positive correlation or by
higher proportions of high than low scorers answering the item correctly.
P values of medium difficulty place upper limits on the relationship of an
item to total test score, but do not guarantee effective item
differentiation (Nunnally, 1967). The r_it and D value statistics were
applied to selected items of some of the exams to examine racial
differences in item differentiation. The r_it statistics were obtained by
calculating a Pearson product-moment correlation between each individual's
right-wrong response to an item and total test score, yielding a
point-biserial coefficient. The D value statistic was calculated by
rank-ordering total scores and splitting them at the median, creating two
subgroups: those who scored high on the total test score and those who
scored low. D values were obtained by subtracting the percentage of low
scorers who answered the item correctly from the percentage of high scorers
who answered it correctly. Details of these procedures and differences
between them are discussed in the Appendix.
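The two item-differentiation statistics described above can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the procedure actually used for this report: it assumes a 0/1 response matrix with one row per examinee, keeps the item inside the total score (as the text describes, an uncorrected item-total correlation), and places an odd middle examinee in the low half of the median split, a detail the text does not specify. P and D are expressed in percentage points, as in the tables; the function names are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when everyone passes or fails the item
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def item_statistics(responses):
    """Return (P, D, r_it) for each item of a 0/1 response matrix.

    responses: one list of item scores (1 = right, 0 = wrong) per examinee.
    P    = percentage of all examinees answering the item correctly
    D    = percentage correct in the high half minus percentage correct
           in the low half (median split on total score)
    r_it = point-biserial (Pearson) correlation of item with total score
    """
    totals = [sum(row) for row in responses]
    # Rank-order examinees by total score and split at the median.
    # With an odd count, the middle examinee falls in the low half.
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    cut = len(order) - len(order) // 2
    low, high = order[:cut], order[cut:]

    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        p = 100 * mean(item)
        p_high = 100 * mean([item[i] for i in high])
        p_low = 100 * mean([item[i] for i in low])
        stats.append((p, p_high - p_low, pearson(item, totals)))
    return stats


# Four hypothetical examinees answering three hypothetical items.
for p, d, r in item_statistics([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]):
    print(f"P={p:.1f}  D={d:.1f}  r_it={r:.3f}")
```

Because D uses only the extreme halves while r_it uses every examinee's total score, the two indices can rank the same items differently, which is one reason the report examines both.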



²This data set was also used in previous studies of this series
(i.e., Robertson & Royle, 1975, and Robertson & Montague, 1976).



Table 1

Advancement Exam Sample Sizes, Means,
and Standard Deviations by Race

Competition to            Black                        White
Pay Grade   Rate      N      Mean      SD          N      Mean      SD

4           ADJ3     47     52.38   12.60        644     69.96   14.75
            BM3      83     58.07    9.38       1033     64.15   11.86
            BT3      33     61.76   13.37        831     73.77   16.68
            CS3      27     67.59   10.15        447     76.12   11.76
            HM3     104     68.00   11.17       1429     73.45   15.53
            MM3      58     62.48   12.26       1259     72.44   16.56

5           ADJ2     30     58.27   14.39        565     63.55   15.01
            BM2      74     60.12   11.70        569     63.43   10.56
            BT2      28     60.11   10.25        511     73.61   16.57
            CS2      47     64.00   11.41        412     69.01   10.66
            HM2     111     63.60    9.43       1391     70.27   13.40
            MM2      30     56.37   13.69        984     74.09   15.95

6           ADJ1     50     67.78   15.56        400     72.31   15.19
            BM1     115     66.33   11.18        502     72.31   11.49
            BT1      79     70.44   13.57        495     80.70   17.18
            CS1     127     68.27   12.22        661     72.04   11.78
            HM1     126     68.58    6.87        546     71.32   11.08
            MM1      62     62.44   11.26        774     75.39   14.04

7           ADJC     88     66.77   14.23       1014     70.07   14.50
            BMC     193     63.60   12.42       1103     65.75   10.87
            BTC     158     77.91   17.61        956     80.57   15.59
            CSC     165     63.01   14.24        771     65.58   13.92
            HMC     157     71.24   13.73       1817     70.75   13.02
            MMC     110     75.35   13.81       1547     78.73   13.63
Effects of Item-Difficulty (P Value) on Item Differentiation



Although P values of medium difficulty generally produce the most
differentiating items, the literature is not in full agreement as to what
the ideal P value or range of P values should be. Thus, to investigate
the relationship between item difficulty and item differentiation, D
values were rank ordered, seven-item sets were extracted from the top
ranks, and the corresponding P values for the D values were identified.
Similarly, P values were rank ordered; seven-item sets were extracted
from the top, middle, and bottom of the ranks; and the corresponding
D values were identified. Finally, r_it values were rank ordered and the
P values for the highest and lowest nine r_it values were identified. All
of the above statistics were computed separately for Blacks and Whites
and then compared for racial differences.
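The first of these rank-ordering steps can be illustrated as follows. This is a hedged sketch: `summarize_top_d_items` is a hypothetical helper, its inputs are assumed to be per-item P and D values listed in the same item order (for one racial group of one rate group), and ties in D are broken by item order because the text does not say how they were handled.

```python
from statistics import median


def summarize_top_d_items(p_values, d_values, k=7):
    """Compare the difficulty of the k most differentiating items with the whole test.

    p_values, d_values: per-item P and D values in the same item order.
    Returns the D range and median of the top-k set, the median P value of
    those items, and the median P value of all items on the test.
    """
    # Rank items from highest to lowest D value and keep the top k
    # (Python's sort is stable, so equal D values keep their item order).
    top = sorted(range(len(d_values)), key=lambda j: d_values[j], reverse=True)[:k]
    top_d = [d_values[j] for j in top]
    return {
        "d_range": (min(top_d), max(top_d)),
        "d_median": median(top_d),
        "p_median_top": median([p_values[j] for j in top]),
        "p_median_test": median(p_values),
    }


# Ten hypothetical items, keeping the top three instead of seven.
print(summarize_top_d_items([20, 30, 40, 50, 60, 70, 80, 90, 25, 35],
                            [5, 10, 15, 20, 25, 30, 35, 40, 1, 2], k=3))
```

A `p_median_top` above `p_median_test` is the pattern the Results section reports: the most differentiating items are easier than the typical item on the test.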

Effects of Item Selection Procedures

To compare the impacts on test reliability and item differentiation
of alternative methods of item selection, the following three types
of tests were simulated and comparative statistics computed:

1. The similar P value (SIM-P) method, developed by Robertson
and Royle (1975), which selects only those items having a White P value
that is not significantly greater than the Black P value.

2. The upgraded P value (UPA-P) method, developed by Robertson
and Royle (1975), which selects only those items having a Black P value
greater than 25.

3. The SEQUIN method, developed by Moonan, Balaban, and Geyser
(1967), which sequentially identifies and selects items with high
correlations to maximize a least-squares prediction of a criterion of
total score. This "heuristic" method selects items in an "accretion"
procedure. The first item selected is the one that correlates most highly
with a specified criterion; subsequent items selected are those whose
intercorrelations with the items already nominated tend to maximize the
correlation coefficient in a regression equation.
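The two Robertson and Royle selection rules can be expressed as simple item filters. The text does not say which significance test SIM-P used, so the sketch below assumes a one-tailed two-proportion z test at the .05 level (critical z = 1.645); that choice and all function names are illustrative assumptions rather than the original procedure. The example items are Items 30, 19, and 21 of the ADJ3 exam (Table 3), with the ADJ3 sample sizes from Table 1.

```python
from math import sqrt


def white_p_significantly_greater(p_black, n_black, p_white, n_white, z_crit=1.645):
    """One-tailed two-proportion z test that the White P value exceeds the Black.

    P values are percentages; n_black and n_white are group sample sizes.
    The choice of test is an assumption made for illustration.
    """
    pb, pw = p_black / 100, p_white / 100
    pooled = (pb * n_black + pw * n_white) / (n_black + n_white)
    se = sqrt(pooled * (1 - pooled) * (1 / n_black + 1 / n_white))
    if se == 0:
        return False  # item passed (or failed) by everyone: no evidence of a gap
    return (pw - pb) / se > z_crit


def sim_p_items(p_black, p_white, n_black, n_white):
    """SIM-P: keep items whose White P is not significantly above the Black P."""
    return [j for j, (pb, pw) in enumerate(zip(p_black, p_white))
            if not white_p_significantly_greater(pb, n_black, pw, n_white)]


def upa_p_items(p_black, threshold=25.0):
    """UPA-P: keep items whose Black P value is greater than 25."""
    return [j for j, pb in enumerate(p_black) if pb > threshold]


# Items 30, 19, and 21 of Table 3 (ADJ3; 47 Blacks, 644 Whites).
p_black = [21.28, 27.66, 21.28]
p_white = [55.75, 19.41, 23.76]
print(sim_p_items(p_black, p_white, 47, 644))  # [1, 2]: item 30 is dropped
print(upa_p_items(p_black))                    # [1]: only item 19 exceeds 25
```

The two rules can disagree: Item 21 survives SIM-P because Blacks and Whites find it about equally hard, but UPA-P drops it because the Black P value lies in the guessing range.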

Internal consistency reliabilities (Kuder-Richardson type; Ghiselli,
Formula 9-19) were recalculated for the new shortened tests and
compared with those of the original (ORIG) 150-item test. The obtained
r values for the shortened tests were corrected by the Spearman-Brown formula
(Ghiselli, 1966, Formula 9-4) to provide comparisons of 150-item tests.
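The two reliability computations can be sketched as below. The text cites only "Kuder-Richardson type" reliabilities, so using KR-20 here is an assumption. The Spearman-Brown step projects a shortened test's reliability r to the reliability expected of a 150-item test built from the same kind of items, r' = k r / (1 + (k - 1) r), with k = 150 divided by the number of items retained.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for a 0/1 response matrix (rows = examinees)."""
    n_items = len(responses[0])
    n_people = len(responses)
    var_total = pvariance([sum(row) for row in responses])
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion correct
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


def spearman_brown(r, n_items, target_items=150):
    """Project reliability r of an n_items test to a test of target_items items."""
    k = target_items / n_items
    return k * r / (1 + (k - 1) * r)


# A hypothetical 75-item test with reliability .50 projects to about .667 at 150 items.
print(round(spearman_brown(0.50, 75), 3))
```

Without this correction, a shortened test would look less reliable simply because it is shorter; the correction puts SIM-P, UPA-P, and SEQUIN tests on the same 150-item footing as the original exam.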

Means and standard deviations were recalculated separately for
Blacks and Whites for the shortened tests and compared with those of the
original test.
Effects of Exam Construction and Processing Procedures



To examine alternative test construction or processing procedures
that might raise test quality, a concurrent measure of on-job performance
was used. Since no longitudinal type of external criterion was available
for the present analysis, such as a measure of technical job performance
at the next higher paygrade, the Performance Factor in the composite
for advancement competition was utilized for illustrative purposes.
(Since this factor is a measure of present rather than subsequent job
performance, and includes evaluation of interpersonal behaviors, such
as leadership and conduct, in addition to technical effectiveness, its
use for illustrative purposes only is emphasized.)

The SEQUIN item-selection procedure was applied to the ADJ3 and BM2
exams with the Performance Factor as a criterion. Items selected early
and late in the sequential procedure by two types of criteria, internal
(total score) and external (on-job performance), were then compared to
determine the characteristics of items that are valid in predicting job
performance.
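The accretion idea behind SEQUIN can be approximated by ordinary forward selection: at each step, add the item that most increases the multiple correlation between the selected items and the criterion. The sketch below is a stand-in under stated assumptions, not the Moonan, Balaban, and Geyser program. `forward_select_items` is a hypothetical name, and it measures each candidate's contribution as the drop in residual sum of squares after orthogonalizing the candidate against the items already chosen. Passing total score or a performance rating as `criterion` mirrors the internal versus external comparison described above.

```python
def _centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_select_items(items, criterion, n_select):
    """Greedy forward selection of items that best predict a criterion.

    items: list of item-score columns, each a list over the same examinees.
    criterion: criterion score per examinee (internal or external).
    Returns item indices in the order they were selected.
    """
    residual = _centered(criterion)  # criterion variation not yet explained
    basis = []   # orthogonalized versions of the chosen items
    chosen = []
    for _ in range(min(n_select, len(items))):
        best, best_gain, best_vec = None, 0.0, None
        for j, column in enumerate(items):
            if j in chosen:
                continue
            e = _centered(column)
            for b in basis:  # remove the part already explained by chosen items
                coef = _dot(e, b) / _dot(b, b)
                e = [x - coef * u for x, u in zip(e, b)]
            ee = _dot(e, e)
            if ee < 1e-12:  # redundant with items already chosen
                continue
            gain = _dot(residual, e) ** 2 / ee  # drop in residual sum of squares
            if best is None or gain > best_gain:
                best, best_gain, best_vec = j, gain, e
        if best is None:
            break
        chosen.append(best)
        basis.append(best_vec)
        coef = _dot(residual, best_vec) / _dot(best_vec, best_vec)
        residual = [r - coef * u for r, u in zip(residual, best_vec)]
    return chosen


# Hypothetical data: six examinees, three items, criterion = item0 + 2 * item1.
item0 = [1, 0, 1, 0, 1, 0]
item1 = [1, 1, 0, 0, 1, 0]
item2 = [0, 0, 0, 1, 0, 1]
criterion = [a + 2 * b for a, b in zip(item0, item1)]
print(forward_select_items([item0, item1, item2], criterion, 2))  # [1, 0]
```

The first pick is simply the item most highly correlated with the criterion; later picks favor items that add information not already carried by earlier ones, which is why the order can change when the criterion changes from total score to a job-performance rating.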
RESULTS

Racial Differences in Item Differentiation 



Black D values were found to be lower than White D values in 18 of the
24 rate groups (see the median difference column of Table 2). A rank-order
correlation of -.42 between the median difference and Black sample size
indicates that the differences are partly attributable to the small Black
sample sizes (i.e., the largest differences tend to be associated with the
smallest Black samples).
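The rank-order correlation reported above can be recomputed from Table 2. The sketch below computes Spearman's rho as the Pearson correlation of average ranks, so tied values share their mean rank. It uses the Rank of Diff. column (Rank 1 = largest positive Black-minus-White difference), which gives the reported coefficient its negative sign, and the Black N column. Two entries are hard to read in the source copy: the HM1 Black sample is taken as 126, and the BTC Black sample as 138 (Table 1 shows 158). The output should therefore be compared with the reported -.42 only approximately.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


# Table 2, rows ADJ3 ... MMC in table order.
rank_of_diff = [15, 21, 23, 16, 22, 20, 13, 5, 18, 3, 24, 19,
                7, 4, 14, 9, 12, 17, 10, 1, 11, 6, 2, 8]
black_n = [47, 83, 33, 27, 104, 58, 30, 74, 28, 47, 111, 30,
           50, 115, 79, 127, 126, 62, 88, 193, 138, 165, 157, 110]
print(round(spearman_rho(rank_of_diff, black_n), 2))
```

A negative value here means that rate groups with smaller Black samples tend to have higher rank numbers, that is, larger Black deficits in median D.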

Table 3 illustrates the racial differences in item differentiation
in terms of both D value and r_it differences for 20 items in the ADJ3
exam. As shown, Black D values were more than 10 percentage points lower
than White D values on 8 items, while White D values were lower on 4 items.
(An inspection of all Black-White D value differences revealed that, in
16 exams, Whites were the higher in a majority of those items with
differences of at least 10 percentage points; in 2 exams Blacks were the
higher; and in the remaining 6 exams, the frequency with Blacks higher and
Whites higher was about equal.) On the ADJ3 exam, employing the r to Z
transformation (Hays, 1963, Formula 15.26.6), Black and White r_it values
were significantly different for only 12 out of 150 items, which is only
4 items more than would be expected by chance (about 8 of 150 items at the
.05 level). Of these 12 items, Blacks were lower on 8.
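The r to Z comparison used for Table 3 can be sketched as follows. It applies Fisher's transformation, Z = atanh(r), to each group's r_it and divides the difference by its standard error, sqrt[1/(N1 - 3) + 1/(N2 - 3)]. With the ADJ3 sample sizes (47 Blacks, 644 Whites), it reproduces the tabled values for Items 22 and 30 to within rounding of the printed correlations. The function name is illustrative.

```python
from math import atanh, sqrt


def fisher_z_test(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (atanh(r1) - atanh(r2)) / se


n_black, n_white = 47, 644
for item, r_black, r_white in [(22, 0.041, 0.343), (30, 0.314, 0.501)]:
    z = fisher_z_test(r_black, n_black, r_white, n_white)
    flag = "significant" if abs(z) > 1.96 else "not significant"  # two-tailed, .05
    print(f"Item {item}: z = {z:.3f} ({flag})")

# Items expected to reach p < .05 by chance alone among 150 items.
print(150 * 5 / 100)  # 7.5, i.e., about 8 items
```

Because the Black samples are small, the standard error is dominated by the 1/(N1 - 3) term, so even sizable differences in r_it (Item 30: .314 versus .501) can fail to reach significance.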

One possible reason for the lower Black item differentiation might be
the finding in the Robertson and Royle (1975) study that larger proportions
of Black than White P values are concentrated in or near the guessing
range (where item differentiation is poorest). The P values for Item 30
in Table 3 tend to support this hypothesis, since the Black P value is
in the guessing range, but the values for Item 16 do not.

Effects of Item-Difficulty (P Value) on Item Differentiation 

Since P values of medium difficulty should yield the highest D values,
it is of interest to compare the P values corresponding to the highest
D values with the median P value of the total test (see Table 4). As
shown, the median P value corresponding to the highest D values is higher
than the total test median P value in 18 of the 24 rate groups for both
Blacks and Whites. (The six exceptions are, for Blacks, CS3, BM2, ADJ1,
MM1, BTC, and HMC; and, for Whites, MM3, BT2, BT1, HM1, BTC, and HMC.)
For example, the corresponding median P value, 42.55, for the highest D
values of the ADJ3 Black group is substantially greater than the total
test median P value, 34.0, for that group.

Similar results were obtained from examining the corresponding P
values for high and low r_it values, and from reversing the orientation
and comparing high and low P values and their corresponding D values.
These results are presented in greater detail in the Appendix.
Table 2

Range and Median D Values

                     Blacks                          Whites                    Median   Rank of
Rate      N     Range              Median      N     Range             Median    Diff.     Diff.

ADJ3     47     -8.18 to 64.73      22.83     644     5.46 to 54.47     24.82    -1.99      15
BM3      83        -- to 48.25      17.86    1033     3.38 to 44.55     21.48    -3.62      21
BT3      33        -- to 69.92      21.11     831     3.84 to 50.83     25.35    -4.24      23
CS3      27    -26.14 to 72.73      19.50     447    -0.08 to 50.05     21.50    -2.00      16
HM3     104    -13.47 to 48.90      19.51    1429     3.26 to 44.73     23.46    -3.95      22
MM3      58        -- to 51.67      20.97    1259     0.35 to 50.87     24.58    -3.61      20
ADJ2     30        -- to 75.00      22.62     565     5.89 to 44.28     24.09    -1.47      13
BM2      74    -11.01 to 48.96      21.64     569     1.07 to 39.65     21.11     0.53       5
BT2      28    -23.59 to 72.31      21.54     511    -6.25 to 49.55     24.67    -3.13      18
CS2      47    -23.09 to 63.45      20.05     412    -3.58 to 37.34     19.05     1.00       3
HM2     111        -- to 47.89      16.91    1391    -0.03 to 44.80     22.02    -5.11      24
MM2      30    -19.64 to 60.00      21.72     984     3.03 to 50.03     24.86    -3.14      19
ADJ1     50    -17.90 to 59.03      25.76     400     2.14 to 52.23     26.06    -0.30       7
BM1     115     -4.48 to 45.61      22.12     502     2.36 to 36.97     21.14     0.98       4
BT1      79     -3.23 to 48.90      23.45     495     0.27 to 48.84     25.33    -1.88      14
CS1     127     -6.58 to 48.22      20.86     661     2.32 to 40.53     21.99    -1.13       9
HM1     126    -28.57 to 65.00      17.50     546     1.67 to 44.82     18.84    -1.34      12
MM1      62    -10.71 to 51.04      22.32     774    -0.75 to 45.44     24.33    -2.01      17
ADJC     88    -15.80 to 56.15      24.42    1014     0.79 to 54.24     25.61    -1.19      10
BMC     193     -2.13 to 44.97      22.54    1103    -1.21 to 39.94     20.57     1.97       1
BTC     138      2.18 to 53.61      23.43     956     1.90 to 42.13     24.63    -1.20      11
CSC     165      1.33 to 56.48      22.81     771    -8.92 to 50.74     22.45     0.36       6
HMC     157     -1.89 to 51.43      22.05    1817     0.47 to 46.30     20.66     1.39       2
MMC     110    -22.04 to 57.91      24.02    1547     0.22 to 43.14     24.82    -0.80       8

Note. Largest positive difference was assigned Rank 1.
Table 3

Racial Differences in Item Differentiation
for 20 Selected Items of the ADJ3 Exam

                  Black                        White               B Minus W Difference
Item    P Value  D Value   r_it      P Value  D Value   r_it      D Value    Z Test(b)

11       21.28     7.61    .330       34.32    25.42    .287      -17.81        .865
12       31.91    19.95   -.021       26.71    11.10    .036        8.83       -.359
13       42.55    23.73    .346       58.54    20.88    .143        2.85       1.39
14       34.04    41.12    .574       73.45    23.39    .315       17.73         --
15       34.04     7.07    .028       38.51     9.13    .013       -2.06         --
16       46.81    -1.99    .176       51.55    35.18     --       -37.17a        --
17       46.81    32.07    .228       61.49    23.80    .241        8.27         --
18       25.53    -1.09   -.026       29.97     8.45     --        -9.54         --
19       27.66    36.21    .126       19.41    17.22     --        18.99         --
20       78.72    34.48    .108       72.98    29.56    .243        4.92         --
21       21.28     1.53    .063       23.76    16.68    .074      -15.15       -.071
22       19.15    50.00    .041       36.65    36.58    .343       13.42     -2.031*
23       34.04    25.86    .317       47.36    45.53    .387      -19.67       -.513
24       36.17    40.42    .140       49.07    32.54    .356        7.88     -1.485
25       38.30     9.96     --        42.86    34.56    .340      -24.60     -2.457*
26       42.55    57.09    .474       49.22    26.60    .272       30.49a      1.516
27       29.79    32.76    .387       61.49    26.00    .266        6.76        .871
28       34.04    16.86    .115       44.88    31.50    .182      -14.64       -.440
29       34.04    -1.15    .201       33.54    20.76    .211      -21.91       -.067
30       21.28    10.54    .314       55.75    54.47    .501      -43.93a     -1.449

a Difference greater than 25.00 in absolute value.
b Significance of the difference between two r_it correlations tested using
  the r to Z transformation (Hays, 1963, Formula 15.26.6):
  z = (Z1 - Z2) / σ(Z1 - Z2), where σ(Z1 - Z2) = sqrt[1/(N1 - 3) + 1/(N2 - 3)].
* Two-tailed test, p < .05.



Table 4

Range and Median of Seven-Item Sets of Highest
D Values with Corresponding P Values

                        Black                                         White
         D Value                  P Value (Median of)       D Value                  P Value (Median of)
Rate     Range            Median  Highest 7   Total       Range            Median  Highest 7   Total
Group                             Items       Test                                 Items       Test

ADJ3     50.00 to 64.73   54.41    42.55      34.04       37.89 to 54.47   41.41    53.42      45.81
BM3      40.36 to 48.25   44.72    45.78      38.74       34.88 to 44.55   35.93    53.44      43.56
BT3      51.47 to 69.92   58.09    45.45      40.42       38.10 to 50.38   41.98      --       48.62
CS3      56.67 to 72.73   62.64    44.44      44.44       36.35 to 50.05   39.86    56.38      49.05
HM3      40.52 to 48.90   45.49    48.08      45.19       39.54 to 44.73   42.39    58.43      49.62
MM3      48.33 to 51.67   49.52    44.83      39.66       44.63 to 50.87   46.58    46.62      48.34

ADJ2     53.33 to 75.00   57.47    56.67      36.67       37.81 to 44.28   39.12    49.91      44.00
BM2      41.54 to 48.96   44.15    37.84      39.34       33.36 to 39.65   37.83    46.92      42.61
BT2      50.00 to 72.31   50.26    50.00      39.29       41.63 to 49.55   44.15    47.95      45.57
CS2      49.09 to 63.45   49.82    44.68      42.55       32.36 to 37.34   35.07    54.85      45.84
HM2      37.40 to 47.89   38.62    42.34      41.44       36.08 to 44.80   38.39    50.18      42.31
MM2      49.32 to 60.00   52.78    43.33      36.67       40.51 to 50.03   42.01    58.43      40.32

ADJ1     52.05 to 59.03   53.62    44.00      48.86       43.33 to 52.23   43.73    53.25      44.32
BM1      37.58 to 45.61   41.54    49.57      41.97       34.05 to 36.97   35.23    54.58      41.45
BT1      41.47 to 48.90   44.61    53.16      51.45       43.91 to 48.84   47.09    55.35      51.45
CS1      40.05 to 48.22   41.97    61.42      49.70       36.25 to 40.53   37.62    52.95      43.03
HM1      51.25 to 65.00   53.75    50.00      45.22       35.04 to 44.82   38.74    30.77      46.57
MM1      43.89 to 51.04   45.16    40.32      53.64       39.17 to 45.44   40.70    53.36        --

ADJC     47.68 to 56.15   53.14    48.13      48.72       45.85 to 54.24   48.72    51.68      42.52
BMC      38.24 to 44.97   39.09    45.32      43.70       34.24 to 39.94   36.77    51.41      43.24
BTC         -- to 53.61   50.86    55.25      52.41       35.78 to 42.13   38.80    50.94      50.20
CSC      44.00 to 56.48   48.00    50.53      42.93       41.08 to 50.74   43.18    50.58      45.63
HMC      46.38 to 51.43   49.03    45.14      46.46       39.35 to 46.30   40.94    46.18      46.30
MMC      43.80 to 57.91   45.13    48.45      33.04       38.47 to 43.14   41.16    54.04      49.54


These results indicate that item differentiation would be improved
for both Blacks and Whites by the construction of tests using items that
are generally easier and, particularly, with less concentration of items
near the guessing range. The results tend to support those of Tinkelman
(1971), who proposed a P value of .75 as the optimum average item difficulty
for items with four options, because the error variance due to chance tends
to be greater when guessing occurs.

Effects of Item Selection Procedures 

Table 5 presents, for five rate groups, the effects on mean score,
median P value, and median D value of employing two types of tests, SIM-P
and UPA-P. (The median D value of the SIM-P test is probably an
overestimate, and that of the UPA-P test an underestimate, because each is
based on the remaining D values rather than on rescoring section scores
and recalculating new D values.) Compared with the original operational
tests (ORIG), it was found that:

1. The SIM-P tests substantially reduced Black-White differences in
mean score and P value (e.g., for ADJ3 in Table 5, mean score differences
were reduced from 17.58 to 3.35, and P value differences from 11.8 to 3.9)
in all five rate groups. However, median D values, as a measure of test
quality, were reduced in two of the five Black groups and four of the five
White groups (e.g., for HM2, the Black median D value remained at 16.9, but
that for Whites was reduced from 22.0 to 20.3).

2. The UPA-P tests produced slight and varied changes in Black-White
differences in mean score and P value (e.g., for MM3 in Table 5, the mean
score difference changed from 9.96 to 9.86), but Black and White median P
values all increased (e.g., BM2 Black group, from 39.2 to 46.0).

Table 6 compares the SIM-P, UPA-P, and SEQUIN types of tests with the
original tests in regard to test reliability and Black-White mean difference.
The SIM-P tests reduced reliability substantially in some rate groups
(e.g., for ADJ3, in the corrected r column for a test length of 150 items,
reliability decreased from .863 to .702) and slightly in others (e.g.,
for BM2, from .729 to .726). The UPA-P and SEQUIN tests both increased
reliability slightly. Thus, SIM-P type tests reduced Black-White differences
in mean score, but at a probably unacceptable cost in reduced test
quality for both Blacks and Whites. (The results of the present study,
using test quality measures of item differentiation and reliability,
provide empirical support for the conclusion of reduced test quality
reached in the Robertson and Royle (1975) study.) The effects of UPA-P
and SEQUIN tests on Black-White mean score differences are slight and
varied. Test quality (i.e., reliability) usually is increased slightly.
These increases are most likely slight because the reliabilities
are already quite high, usually in the high .80s. In the one exception,
BM2, there is a modest increase from the relatively low .729 to .764
(for UPA-P) and .769 (for SEQUIN).
Table 5

... by Race on Three Types of Tests

                               Black                     White              B Minus W Difference
Rate   Type               X      Median  Median    X      Median  Median    X      Median  Median
Group  Test             Total   P Value D Value  Total   P Value D Value  Total   P Value D Value

ADJ3   ORIGᵃ             52.38   34.0    22.8    69.96   45.8    24.8    -17.58  -11.8   -2.0
       SIM-Pᵇ,ᵈ          55.16   36.2    20.9    58.51   40.1    21.1     -3.35   -3.9   -0.2
       UPA-Pᶜ,ᵈ          60.19   38.3    24.4    77.52   51.3    24.8    -17.33  -13.0   -0.4
       SIM-P minus ORIG   2.78    2.2    -1.9   -11.45   -5.7    -3.7
       UPA-P minus ORIG   7.81    4.3     1.6     7.56    5.5     0

HM3    ORIGᵃ             68.00   45.2    19.5    73.45   49.6    23.5     -5.45   -4.4   -4.0
       SIM-Pᵇ,ᵈ          72.35   48.1    19.5    74.39   50.3    22.7     -2.04   -2.2   -3.2
       UPA-Pᶜ,ᵈ          76.10   49.0    20.3    81.34   53.1    24.6     -5.24   -4.1   -4.3
       SIM-P minus ORIG   4.35    2.9     0       0.94    0.7    -0.8
       UPA-P minus ORIG   8.10    3.8     0.8     7.89    3.5     1.1

MM3    ORIGᵃ             62.48   39.7    21.0    72.44   48.3    24.6     -9.96   -8.6   -3.6
       SIM-Pᵇ,ᵈ          66.59   41.4    20.1    68.99   44.6    22.7     -2.40   -3.2   -2.6
       UPA-Pᶜ,ᵈ          67.18   41.4    23.3    77.04   50.3    25.8     -9.86   -8.9   -2.5
       SIM-P minus ORIG   4.11    1.7    -0.9    -3.45   -3.7    -1.9
       UPA-P minus ORIG   4.70    1.7     2.3     4.60    2.0     1.2

BM2    ORIGᵃ             60.12   39.2    21.6    63.43   43.2    21.1     -3.31   -3.9    0.5
       SIM-Pᵇ,ᵈ          61.03   41.2    21.6    61.54   41.5    21.1     -0.51   -0.3    0.5
       UPA-Pᶜ,ᵈ          69.27   46.0    24.4    72.24   48.0    22.2     -2.97   -2.0    2.2
       SIM-P minus ORIG   0.91    2.0     0      -1.89   -1.7     0
       UPA-P minus ORIG   9.15    6.8     2.8     8.81    4.8     1.1

HM2    ORIGᵃ             63.60   41.4    16.9    70.27   46.3    22.0     -6.67   -4.9   -5.1
       SIM-Pᵇ,ᵈ          67.83   45.0    16.9    69.67   45.8    20.3     -1.84   -0.8   -3.4
       UPA-Pᶜ,ᵈ          70.84   45.1    19.0    76.52   49.9    22.3     -5.68   -4.8   -3.3
       SIM-P minus ORIG   4.23    3.6     0      -0.60   -0.5    -1.7
       UPA-P minus ORIG   7.24    3.7     2.1     6.25    3.6     0.3

ᵃIncludes the complete set of 150 items.
ᵇIncludes only items in which the Black P value was not significantly less
than the White P value.
ᶜIncludes only items in which the Black P value was greater than .25.
ᵈMean total scores are simulated by obtained SIM-P or UPA-P score times
(N items in original test / N items in simulated test).
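The item rules and the simulated means in the Table 5 footnotes can be sketched as follows. The footnotes do not say which significance test defined "not significantly less," so the one-sided pooled two-proportion z test at alpha = .05 used here is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def upa_p_keep(p_black: float, floor: float = 0.25) -> bool:
    """UPA-P rule (footnote c): keep an item only if the Black P value exceeds .25."""
    return p_black > floor


def sim_p_keep(p_black: float, p_white: float, n_black: int, n_white: int,
               z_crit: float = -1.645) -> bool:
    """SIM-P rule (footnote b): keep an item unless the Black P value is
    significantly less than the White P value.

    Assumption: one-sided pooled two-proportion z test, alpha = .05.
    """
    pooled = (p_black * n_black + p_white * n_white) / (n_black + n_white)
    se = sqrt(pooled * (1 - pooled) * (1 / n_black + 1 / n_white))
    if se == 0:
        return True  # both groups all right or all wrong: no difference to test
    z = (p_black - p_white) / se
    return z > z_crit


def simulated_total(obtained_mean: float, n_items_simulated: int,
                    n_items_original: int = 150) -> float:
    """Footnote d: rescale a shortened test's mean to the original test length."""
    return obtained_mean * n_items_original / n_items_simulated


# ADJ3 SIM-P: Table 6 gives a Black mean of 27.21 on 74 items; Table 5 prints 55.16.
print(round(simulated_total(27.21, 74), 2))
# Hypothetical item with Black P = .30 and White P = .50, using the ADJ3 group sizes (47/644).
print(sim_p_keep(0.30, 0.50, 47, 644), upa_p_keep(0.30))
```

The hypothetical item passes the UPA-P floor but fails the assumed SIM-P test, which illustrates why the SIM-P tests were much shorter than the UPA-P tests.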



Table 6

Reliability, Mean, and Standard Deviation of Four Types of Tests

Rate Group   Type                N       Reliabilityᵉ     Black            White            X_d
and N        Test                Itemsᵃ  Obt.ᵇ   Cor.ᶜ    X       SD       X       SD       Diff.ᵈ
(Black/White)

ADJ3         ORIG                150     .863    .863     52.38   12.60    69.96   14.75    -1.285
(47/644)     SIM-P                74     .538    .702     27.21    5.18    28.86    5.89    -0.298
             UPA-P               114     [?]     .865     45.75   11.49    58.92   11.78    -1.13[?]
             SEQUIN              [?]     [?]     .875     44.66   11.21    [?]     13.[?]   [?]
             SIM-P minus ORIG                   -.161
             UPA-P minus ORIG                    .002
             SEQUIN minus ORIG                   .012

HM3          ORIG                149     .870    .870     68.00   11.17    73.45   15.53    -0.408
(104/1429)   SIM-P               115     .829    .863     55.85    8.47    57.41   11.85    -0.15[?]
             UPA-P               126     .867    .885     64.36   10.18    68.79   14.45    -0.359
             SEQUIN              125     .868    .887     59.69   10.69    64.55   14.48    -0.386
             SIM-P minus ORIG                   -.007
             UPA-P minus ORIG                    .015
             SEQUIN minus ORIG                   .017

MM3          ORIG                150     .884    .884     62.48   12.2[?]  72.44   16.56    -0.691
(58/1259)    SIM-P                95     .784    .851     42.17    7.48    43.69    9.78    -0.176
             UPA-P               133     .878    .890     59.57   11.52    68.31   15.48    -0.647
             SEQUIN              125     .879    .897     53.2[?] 10.7[?]  62.29   15.07    -0.697
             SIM-P minus ORIG                   -.033
             UPA-P minus ORIG                    .006
             SEQUIN minus ORIG                   .013

BM2          ORIG                150     .729    .729     60.12   11.70    63.43   10.56    -0.297
(74/569)     SIM-P               126     .690    .726     51.28    9.98    51.55    9.08    -0.028
             UPA-P               119     .720    .764     54.97   10.55    57.33    9.67    -0.253
             SEQUIN              125     .736    .769     53.25   11.11    56.64   10.02    -0.522
             SIM-P minus ORIG                   -.003
             UPA-P minus ORIG                    .035
             SEQUIN minus ORIG                   .040

HM2          ORIG                149     .820    .820     63.60    9.43    70.27   13.40    -0.584
(111/1391)   SIM-P               109     .710    .771     49.[?]   7.02    51.00    8.92    -0.168
             UPA-P               125     .800    .827     50.43    8.57    64.57   11.84    -0.501
             SEQUIN              125     .814    .840     53.41    8.70    59.52   12.5[?]  -0.5[?]1
             SIM-P minus ORIG                   -.049
             UPA-P minus ORIG                    .007
             SEQUIN minus ORIG                   .020

ᵃItems remaining after deletion.
ᵇObtained (Obt.) value.
ᶜCorrected (Cor.) value for a test of 150 items (Nunnally, 1967, Formula 7-6, p. 225).
ᵈX_d Difference: the mean difference in standard deviation units, calculated by
X_d = (X_B - X_W) / [(SD_B + SD_W) / 2].
ᵉCalculated on Black and White groups combined.
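The X_d column of Table 6 expresses each Black-White mean difference in standard deviation units. Reading the garbled footnote as dividing by the average of the two group standard deviations (an interpretation, not a documented formula) reproduces the printed ORIG values to within about .001, as this small Python check shows; the function name is hypothetical.

```python
def standardized_difference(mean_black: float, mean_white: float,
                            sd_black: float, sd_white: float) -> float:
    """Black-minus-White mean difference in SD units.

    Assumption: the SD unit is the simple average of the two group SDs.
    """
    return (mean_black - mean_white) / ((sd_black + sd_white) / 2)


# ORIG rows of Table 6: (Black mean, White mean, Black SD, White SD, printed X_d)
rows = {
    "ADJ3": (52.38, 69.96, 12.60, 14.75, -1.285),
    "HM3": (68.00, 73.45, 11.17, 15.53, -0.408),
    "BM2": (60.12, 63.43, 11.70, 10.56, -0.297),
    "HM2": (63.60, 70.27, 9.43, 13.40, -0.584),
}

for group, (mb, mw, sb, sw, printed) in rows.items():
    print(group, round(standardized_difference(mb, mw, sb, sw), 3), printed)
```

The small residual differences come from the rounding of the printed means and standard deviations.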



Effects of Exam Construction and Processing Procedures



When the Performance Factor was employed as representative of an
external, job-relevant criterion, the SEQUIN procedure reached a maximum
validity with a small subset of items. For the ADJ3 Exam, the validity
coefficient rose rapidly to a maximum of .206 with the selection of the
20 most valid items (see Figure 1a) and then tapered off to a slightly
negative validity of -.031 for all 150 items. Similarly, for the BM2
Exam, the validity coefficient reached a peak of .273 for 30 items and
a final value of .016. In contrast to the validity coefficient, the
reliability coefficient, which is largely a function of the number of
items in a test, continued to rise steadily (see Figure 1b) during the
selection of the first 100 items and leveled off with the selection of
the "best" 120 items.

Since SEQUIN also identifies the specific items selected in the "accre-
tion" process, it was possible to categorize items according to content and
to compare items selected early and late in the process. In the selection of
items from the ADJ3 Exam (see Table 7), twice the proportion of theoretical
items occurred in the last 25 (i.e., least valid) items as in the first 25
(i.e., most valid), although this 16-percentage-point difference was not
significant when a chi-square test was applied.
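The accretion idea behind SEQUIN can be illustrated with a simple forward-selection sketch. The report does not spell out SEQUIN's algorithm in this section, so the following Python code is only an assumed, generic version: at each step it adds the item that most raises the correlation between the running item-sum score and a criterion. Passing total test score as the criterion gives an internal version; passing a job-performance measure such as the Performance Factor gives an external version. All data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def greedy_accretion(responses, criterion):
    """Forward ("accretion") item selection.

    responses[p][i] is 1 if examinee p answered item i correctly, else 0.
    Each step adds the item giving the highest correlation between the
    running item-sum score and the criterion. Returns the order in which
    items were added and the validity after each addition.
    """
    remaining = list(range(len(responses[0])))
    scores = [0] * len(responses)
    order, validity = [], []
    while remaining:
        best_item, best_r = remaining[0], float("-inf")
        for i in remaining:
            trial = [s + row[i] for s, row in zip(scores, responses)]
            r = pearson(trial, criterion)
            if r > best_r:  # strict ">" keeps the lowest-numbered item on ties
                best_item, best_r = i, r
        remaining.remove(best_item)
        scores = [s + row[best_item] for s, row in zip(scores, responses)]
        order.append(best_item)
        validity.append(best_r)
    return order, validity


# Hypothetical data: 6 examinees x 3 items.
responses = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
external = [1, 1, 0, 0, 1, 0]               # hypothetical performance rating
internal = [sum(row) for row in responses]  # total test score
print(greedy_accretion(responses, external))
print(greedy_accretion(responses, internal))
```

Plotting the returned validity list against the number of items selected gives a curve of the kind summarized for the ADJ3 and BM2 Exams.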

To compare the ADJ3 Exam items selected by an internal and by an
external criterion, the 14 items with the lowest item-total correlations
(r_it ≤ .050) were identified. With the internal criterion, 11 of the 14
items were among the last third of the items to be selected (see Table 8).
However, with the external criterion (the Performance Factor), 12 of the
14 items were in the first third of the items selected. In particular,
three of the items with both a very low P value and a very low r_it value
were among those selected earliest (fifth, seventh, and thirteenth) by the
external criterion.

Similar results were obtained on the BM2 Exam (see Table 9). Twelve
of the 15 items with the lowest item-total correlations were among the
first third of items selected by the external criterion, with six of those
items among the first 24.



Figure 1. Selection of most valid items by SEQUIN: (a) validity and
(b) reliability as a function of the number of best items selected (15 to 150).



Table 7

Proportions of Theoretical and Applied Type
Items in 25 Most and Least Valid Items
Selected by SEQUIN (ADJ3 Exam)

                         Items Selected by SEQUIN
Item Content        Most Valid      Least Valid      All Items
Category            25 Items        25 Items         (150)
                    N      %        N      %         N      %

Theoretical          4     16        8     32        32     21
Applied             20     80       16     64       110     73
Indeterminate        1      4        1      4         8      5

Note. For a 2 x 2 matrix of only those items identified as theoretical
or applied (Theoretical: 4, 8; Applied: 20, 16), χ² = 1.0, .50 > p > .25.
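The chi-square in the Table 7 note can be checked directly. Assuming Yates' continuity correction was applied (the uncorrected statistic for the same 2 x 2 matrix would be about 1.78), the Python sketch below reproduces χ² = 1.0 and a p value between .25 and .50; the upper-tail p value for 1 degree of freedom is computed with the standard-library complementary error function.

```python
from math import erfc, sqrt


def yates_chi_square_2x2(a: int, b: int, c: int, d: int):
    """Chi-square with Yates' continuity correction for [[a, b], [c, d]] (1 df)."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            corrected = max(0.0, abs(observed[i][j] - expected) - 0.5)
            chi2 += corrected ** 2 / expected
    p = erfc(sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return chi2, p


# Rows: theoretical, applied; columns: 25 most valid, 25 least valid (Table 7).
chi2, p = yates_chi_square_2x2(4, 8, 20, 16)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```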



Table 8

Comparison Between Internal and External Criteria
of SEQUIN Item Accretion of Lowest Item-
Differentiation Values (ADJ3 Exam)

                                   Sequence in which Item was Selected by:
Item     P        Lowest r_itᵃ     Internal Criterion     External Criterion
No.      Value    (≤ .050)         (Total Score)          (Performance Factor)

120      .333      .028             61                     46
105      .372      .043             65                     62
15       .382      .020             92                     16
128      .423      .034            103                     45
135      .425     -.009            104                     17
118      .195      .041            106                     39
[?]      [?]       [?]             109                      4
71       .124     -.022            118                      7
55       .534     -.043            123                    117
18       .297      .050            128                     34
45       .161     -.022            136                     13
97       .465     -.054            142                     37
12       .271      .023            147                     33
113      .100     -.030            150                      5

Note. N = 691 (47 Black and 644 White combined).
ᵃValues are slight overestimates, since item is included in total score.
Table 9

Comparison Between Internal and External Criteria
of SEQUIN Item Accretion of Lowest Item-
Differentiation Values (BM2 Exam)

                                   Sequence in which Item was Selected by:
Item     P        Lowest r_itᵃ     Internal Criterion     External Criterion
No.      Value    (≤ .050)         (Total Score)          (Performance Factor)

93       .661      .039             52                     34
115      .295      .024             59                     22
60       .199      .003             99                    107
5        .591      .007            100                     76
139      .212     -.019            110                     43
37       .292      .041            115                     14
98       .215     -.060            132                     32
107      .104     -.037            137                     31
18       .267     -.019            138                    136
131      .117     -.090            143                      6
81       .152      .031            145                    115
19       .093     -.020            147                     24
130      .070      .025            148                     15
123      .065     -.083            149                     62
73       [?]      -.048            150                     16

Note. N = 643 (74 Black and 569 White combined).
ᵃValues are slight overestimates, since item is included in total score.
DISCUSSION

Procedures for Improving Advancement Tests 



The problem of how to improve enlisted advancement exams is discussed
in the light of the results reported above, the realities of the administra-
tion and use of the tests, and the desirability of achieving one or more of
three objectives: (1) increasing test reliability, (2) increasing test
validity, and (3) decreasing Black-White score differences. It is, of
course, easier to state an objective than to achieve it. Even when the
rules of good item construction are followed, there is no assurance that
the item characteristics desired will be achieved unless the items are
pretested. Nunnally (1967) suggests pretesting at least twice as many
items as are intended for the final test. Although such a procedure may
be ideal, there are practical limitations in regard to the development
of Navy enlisted advancement exams. Advancement is intensely competitive,
particularly in the higher paygrades, where the number of openings is
much smaller than the number of highly qualified candidates available.
If items were pretested on a sample group, the examinees in the sample
group might have the advantage of being alerted to the specific content
of the forthcoming exam. Also, the P values would probably be lower in
the pretest than in the operational test, since the pretest examinees would
not be motivated to study as intensely as they would for the operational
test.

In lieu of a pretesting procedure, the tests could be improved by
employing four other procedures:

1. Test validation on an external, job-relevant criterion.

2. Identification of the most and least valid items, and a content
categorization of the items identified.

3. Utilization of item construction procedures that tend to produce
items with the desired characteristics (e.g., having specified levels
of item difficulty, differentiation, and validity).

4. Post hoc item deletion procedures that eliminate undesirable items
after administration but prior to final scoring.

Each of these four approaches is discussed in detail below. 

Test Validation 

The primary concern with a personnel selection test is, of course,
its relevance to the purpose of the selection: in the present case,
the individual's effectiveness in the next higher grade for which selected.
The measures of test quality investigated in the present study (test
reliability and item differentiation) are important to test validity, by
setting upper limits on it, but do not of themselves assure test validity.



Validation of the advancement exams on job-relevant criteria is
needed for two reasons. First, the courts are becoming increasingly in-
sistent on empirical evidence of the job relevance of personnel selection
procedures in compliance with the Civil Rights Act of 1964. Second, CNO
Objective Number CNO-1, entitled Retention of Career Personnel (September
1974), is addressed not to the retention of personnel in general but to
the retention of top quality career personnel. The demonstration of top
quality certainly is largely a function of an individual's effectiveness
on the job, and motivation to reenlist is certainly heavily influenced by
advancement success.

Highly effective validation procedures are available that would be
responsive to the above two requirements. The SEQUIN procedure, which was
demonstrated with an illustrative job-relevant criterion, was shown to be
quite useful, not only to maximize the validity of a test using a subset
of items but also to identify the specific items that contribute to, and
detract from, prediction of the criterion behavior.

Identification and Categorization of Valid Items 

Since SEQUIN identifies the specific items selected in the "accre-
tion" process, it also provides test makers with the capability to analyze
and categorize the content of each item. With this knowledge, certain
"mixes" of various categories of items could be considered in the construc-
tion of future tests. For example, there might be an optimal ratio of
theoretical to applied type items for maximum job-relevant validity. The
difference between the proportions of theoretical and applied items in the
first and last 25 items selected in the ADJ3 Exam was not significant.
However, with larger pools of items (e.g., the first and last 50-item sub-
sets from a number of exams of similar occupational specialties), signi-
ficant differences might be identified. Also, categories other than
theoretical-applied might be studied, such as the differential validity
of the content of the subtest sections.

Item Construction Procedures 

In the reliability analysis of five rate groups (see Table 6),
the reliability of the BM2 Exam, .729, was substantially below that of
the other four groups. This result might be a function of either the
statistical or the structural characteristics of the items. For example,
the median P value (see Table 4) and D value (see Table 2) of the BM2 White
group are relatively low among all White groups. (Since the Black and White
groups of each rate group were combined to calculate the reliability, the
obtained value reflects primarily the distribution statistics of the
majority White group.)
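The report describes its reliabilities only as internal-consistency type coefficients computed on the combined Black and White groups. Assuming they are Kuder-Richardson Formula 20 (KR-20) values for right/wrong items, a minimal Python sketch of the computation on a pooled response matrix looks like this; the response data are hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for rows of 0/1 item scores (one row per examinee).

    KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores),
    using population (divide-by-N) variances throughout.
    """
    n_people = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores have no variance")
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_people  # item P value
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses; the two groups are pooled before computing KR-20.
black_rows = [[1, 1, 0], [1, 0, 0]]
white_rows = [[1, 1, 1], [0, 0, 0]]
print(kr20(black_rows + white_rows))
```

With real rate groups, the much larger White samples dominate the pooled item variances, which is why the obtained coefficient mainly reflects the White group.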

Although the literature abounds with guidance for item writing, 
many of the rules have not been adequately evaluated empirically. In one 
empirical demonstration of undesirable item characteristics, Dudycha and 
Carpenter (1973) found that: 
1. An inclusive distractor, such as "all (or any or none) of
the above" (as opposed to a specific distractor, which is a specified word
or phrase), reduces item differentiation.

2. A negative stem structure, which includes the word "not"
(as opposed to a positive stem structure, which does not), increases item
difficulty.

3. An open-stem structure, which requires the answer to complete
the sentence (as opposed to a closed-stem structure, which is a complete
sentence), increases item difficulty.

4. The combination of open-positive stems and closed-negative
stems in the same test reduces item differentiation.

It was observed that all four of these item designs are used with varying
frequency in the present advancement exams, particularly in the BM2 Exam.
It would thus be useful to determine whether the use of these (and perhaps
other) structures contributes to undesirable item characteristics (e.g.,
reduced P values or D values).

Also, median P values and D values would probably be increased by
raising the criterion values for reuse of items (e.g., P values neither
less than .30 nor greater than .85, and r_it, with the item included in
the score, no less than .05), subject, however, to item validity against
an external criterion.
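The illustrative reuse limits above can be expressed as a simple screen. In this Python sketch, the P and r_it limits come from the text, while the external-validity override and its .10 threshold (validity_min) are hypothetical stand-ins for "subject to item validity against an external criterion."

```python
def keep_for_reuse(p_value, r_it, external_validity=None,
                   p_min=0.30, p_max=0.85, r_min=0.05, validity_min=0.10):
    """Screen an item for reuse.

    The statistical screen uses the illustrative limits in the text.
    validity_min is a hypothetical threshold: an item that predicts the
    external criterion at least this well is retained even if it fails
    the statistical screen.
    """
    passes_screen = p_min <= p_value <= p_max and r_it >= r_min
    if external_validity is not None and external_validity >= validity_min:
        return True
    return passes_screen


# Item 113 of the ADJ3 Exam (Table 8): P = .100, r_it = -.030.
print(keep_for_reuse(0.100, -0.030))
print(keep_for_reuse(0.100, -0.030, external_validity=0.15))  # hypothetical validity
```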

Post Hoc Item Deletion Procedures 

Although pretesting of items is probably not feasible, applica-
tion of item deletion procedures that eliminate undesirable items (e.g.,
those with extremely high or low P values, or low differentiation values)
subsequent to administration but prior to final scoring for selection
purposes might increase the reliability or validity of the exams. The
SEQUIN accretion procedures described above demonstrated that a subset
of items could be selected that yields a higher validity than, and an
equally high reliability as, the total set of items. However, these results
should be considered tentative, because the procedure capitalizes on the
intercorrelations of the sample data and is thus influenced by chance.
Cross-validation is necessary to ensure that the results are not an effect
of sampling error (Henryssen, 1971).
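Cross-validation can be sketched by selecting items on one random half of the sample and checking the validity of the selected subset on the other half. For brevity, this Python sketch selects the k items with the highest item-criterion correlations rather than using an accretion procedure, and it runs on pure-noise hypothetical data, so whatever validity appears in the derivation half is capitalization on chance.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def cross_validate(responses, criterion, k, seed=0):
    """Select the k items most correlated with the criterion in a random half
    of the sample, then report the validity of their sum in both halves."""
    rng = random.Random(seed)
    people = list(range(len(responses)))
    rng.shuffle(people)
    half = len(people) // 2
    derive, holdout = people[:half], people[half:]

    def item_validity(i):
        return pearson([responses[p][i] for p in derive],
                       [criterion[p] for p in derive])

    chosen = sorted(range(len(responses[0])), key=item_validity, reverse=True)[:k]

    def subset_validity(group):
        scores = [sum(responses[p][i] for i in chosen) for p in group]
        return pearson(scores, [criterion[p] for p in group])

    return chosen, subset_validity(derive), subset_validity(holdout)


# Pure-noise hypothetical data: 200 examinees, 40 items, unrelated criterion.
gen = random.Random(1)
responses = [[gen.randint(0, 1) for _ in range(40)] for _ in range(200)]
criterion = [gen.gauss(0, 1) for _ in range(200)]
chosen, r_derive, r_holdout = cross_validate(responses, criterion, k=10)
print(f"derivation r = {r_derive:.3f}, holdout r = {r_holdout:.3f}")
```

A large drop from the derivation to the holdout coefficient is the signature of the sampling-error effect the text warns about.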

The selection of items to increase reliability will usually tend
to increase validity (Henryssen, 1971). However, if excessive emphasis
is placed on increasing test homogeneity, the test may become too narrow
and one-sided in content to have high validity. In the SEQUIN demonstra-
tion with the ADJ3 and BM2 Exams, many of the items with the lowest item-
total correlations were selected near the end of the accretion process
by an internal criterion, but near the beginning by an external criterion.

A number of reasons might account for these results (other than
that the use of the present Performance Factor as an external criterion
may not have been appropriate, even for illustrative purposes). If the
test content tends to be heterogeneous rather than homogeneous, as
suggested by some of the low intercorrelations among section scores, then
internal consistency type measures of reliability may be of limited rele-
vance. This possibility is suggested by a comparison between the reli-
ability and validity coefficients of the ADJ3 and BM2 Exams. Although
an internal consistency type measure of reliability places an upper limit
on the validity of a test, this limit applies only to homogeneous tests.
With a heterogeneous test, however, elimination of items with low
item-total correlations could result in the reduction of predictable
variance. It may be observed that the reliability of the BM2 Exam is
lower than that of the ADJ3 Exam, but its validity is higher. Also,
when the correlation between a test and a criterion is near zero or
slightly negative (as is that of the ADJ3 Exam with the external
criterion), the items that correlate lowest with total test score (i.e.,
the lowest r_it values) could very well be those that correlate highest
with the external criterion.
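A tiny constructed example makes the last point concrete. In this hypothetical 6-examinee, 5-item test, items 0 to 3 form one homogeneous cluster, and item 4 measures something else that a hypothetical criterion tracks. The Python sketch shows that the test-criterion correlation is near zero, that item 4 has the lowest item-total correlation, and that item 4 also has the highest item-criterion correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


cluster = [1, 1, 0, 0, 1, 0]   # response pattern shared by items 0-3
other = [1, 0, 1, 0, 0, 1]     # item 4: different content
items = [cluster, cluster, cluster, cluster, other]  # item columns
criterion = list(other)        # hypothetical external criterion
total = [sum(col[p] for col in items) for p in range(6)]

r_it = [pearson(col, total) for col in items]      # item-total (item included)
r_ic = [pearson(col, criterion) for col in items]  # item-criterion
print("test-criterion r:", round(pearson(total, criterion), 3))
for i, (a, b) in enumerate(zip(r_it, r_ic)):
    print(f"item {i}: r_it = {a:.3f}, item-criterion r = {b:.3f}")
```

Dropping item 4 because of its low r_it would make the test more homogeneous but would remove the only item that predicts the criterion.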

Balancing Item Biases 

Another issue pertains to the question of the compatibility of the
two objectives identified by the Chief of Naval Personnel to be investi-
gated: the feasibility of compiling "tests composed of questions having
identical or correlatable degree of difficulty (Rho) factors for both
Blacks and Whites." The Robertson and Royle (1975) study was addressed
to the first objective, "identical" difficulty, and the Robertson and
Montague (1976) study to the second, "correlatable" difficulty. The
present study addressed both objectives in the context of item differen-
tiation and test reliability.

Both the Robertson and Royle (1975) study and the present study found that
the construction of tests of items of similar difficulty, from the existing
pool of items, was not feasible. The question might be raised as to the
existence of, or the possibility of developing, items on which Blacks
are superior. If such items were found, tests might be constructed with
a "balance" of items in which Whites do well on some, and Blacks on others.
Ironically, such tests would result in increased racial bias, as measured
by a decrease in relative item difficulty (Rho value). (The issue of
"balancing" item biases is discussed briefly by Cleary and Hilton (1968)
and by Jensen (1973).)

Implications of the Results 

The demonstrations of improved item differentiation by eliminating
excessively difficult items and items with low or negative differentia-
tion suggest the need to implement the item-deletion and item-construc-
tion procedures discussed. Such procedures would result in a slight
decrease in mean score differences between Blacks and Whites and, in terms
of test quality, a slight increase in item differentiation for Whites and
a moderate increase for Blacks. Also, any procedure that would raise the
level of P values would reasonably be expected to reduce the proportion
failed by the exam cut-score, thereby enabling those who passed to con-
tinue to compete on their other advancement factors. Although such a
procedure was not demonstrated in the present study, it is of particular
interest and advantage to Blacks.



However, the SEQUIN demonstration, in which the items selected were
compared by internal and external criteria, also suggests that items deleted
to increase item differentiation or test homogeneity may be the types of
items that best contribute to predicting job-relevant performance by an
external criterion. Thus, until external validation studies are performed
to determine the relationship of test heterogeneity to subsequent perfor-
mance in the grade to which advanced, recommendations to implement the pro-
cedures discussed above are deemed premature.
CONCLUSIONS

1. Enlisted Advancement Exam item differentiation and internal con-
sistency type test reliability could be improved for both Blacks and Whites
by using item selection and construction procedures identified, developed,
or demonstrated in this study.

2. The development of tests in which only the items similar in difficulty
for both Blacks and Whites are used is not feasible, because it would reduce
test quality. However, the elimination of excessively difficult items,
by either alternative item construction or post-administration item dele-
tion procedures, would improve test quality and, in particular, benefit
Blacks, because the proportion of candidates failed by the exam cut-score
would be reduced, thereby enabling those who passed to continue to com-
pete on their other advancement factors.

3. The two objectives that were identified for investigation in the
present series of studies, namely the feasibility of compiling "tests composed
of questions having identical or correlatable degree of difficulty . . .
for both Blacks and Whites," may not be compatible. As stated above,
construction of tests of only items of "identical" difficulty, at least
from the existing pool of items, was not feasible. Using "balanced" items
might be an alternative to items of "identical" difficulty. However, even
if new items could be developed on which Blacks were superior, and tests
then constructed with a "balance" of items in which Whites do well on some
and Blacks on others, such tests would be characterized by a reduced "cor-
relatable" degree of difficulty. Thus, the use of a measure of relative
item difficulty as an indication of possible racial bias appears to be of
limited relevance in a study directed toward identifying effective pro-
cedures to provide all racial groups with similar opportunities for advance-
ment.
RECOMMENDATIONS



The fundamental question regarding racial differences in advancement
should pertain to the relationship of each selection factor, including
the present Technical Knowledge Exam, to subsequent job-relevant performance
in the grade to which selected. The results of the final phase of the
present analysis raise important new questions regarding differences
between the "best" items selected by an internal and by an external criterion.
Thus, implementation of the procedures discussed or demonstrated in the
present study (which was at the exploratory level of research) before
these new questions are addressed would be premature.

It is recommended that: (1) the empirical validity of the present
tests on subsequent performance be compared between Blacks and Whites,
and (2) the alternative item processing and item construction procedures
discussed in the present study be validated and compared on internal and
external criteria.
APPENDIX
METHODOLOGICAL ISSUES IN ITEM ANALYSIS



The calculation of item-difficulty and item-differentiation indices
for a large number of tests with large subject pools permitted investiga-
tion of methodological questions as well as the study of racial group
differences.

A number of computational approaches may be used in determining item
differentiation from the item-total relationship, including the r_it and
D value¹ techniques employed in this study. These and other alternative
procedures provide much the same information. The rankings of item-differ-
entiation values by alternative procedures usually yield correlations
among the ranks in the .90s (Nunnally, 1967). In computing item-differ-
entiation statistics, if the item itself is included in the total (or
section) score, some portion of the correlation value obtained will be
an artifact from the presence of the item itself (Nunnally, 1967).
(Obviously, the size of this artifact will vary inversely with the number
of items in the test or section.) Also, if a test contains subtests (i.e.,
"sections") of differing content (i.e., a nonhomogeneous type test), it
may be more appropriate to compare item responses with the subtest score
than with total score.

Alternative Item Analysis Procedures Employed

To investigate the effects of including the item in the total score
and of computing item-differentiation statistics on section versus total
test scores, the following alternative statistics were computed:

1. r_is (w/ item): item-section correlation, with the item included
in the section score.

2. r_is (w/o item): item-section correlation, without the item in-
cluded in the section score.

3. r_it (w/ item): item-total correlation, with the item included
in the total score.

4. r_it (w/o item): item-total correlation, without the item in-
cluded in the total test score.

5. D_s value (w/ item): percentage difference between high and low
section scorers who answered the item correctly.

6. D_t value (w/ item): percentage difference between high and low
total scorers who answered the item correctly.

¹The D value of the present study is to be distinguished from the
Lawshe (1942) D value, adopted from the Kelley (1939) technique, which
expresses the difference between the two scoring groups in terms of
sigma units.
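The D values in items 5 and 6 above can be sketched as follows. This section does not say how the high and low scoring groups were formed, so the upper and lower 27 percent groups used here (a common convention, set by the hypothetical `fraction` parameter) are an assumption. Passing section scores gives D_s; passing total scores gives D_t.

```python
def d_value(item, scores, fraction=0.27):
    """Percentage of high scorers minus percentage of low scorers answering correctly.

    item[p] is 1 if examinee p answered correctly, else 0; scores[p] is the
    section score (for D_s) or total score (for D_t).
    Assumption: high and low groups are the top and bottom `fraction` of examinees.
    """
    n = len(scores)
    k = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda p: scores[p])  # lowest score first
    low, high = ranked[:k], ranked[-k:]
    pct_high = 100 * sum(item[p] for p in high) / k
    pct_low = 100 * sum(item[p] for p in low) / k
    return pct_high - pct_low


# Hypothetical data: 10 examinees; only the five highest scorers pass the item.
scores = list(range(10))
item = [0] * 5 + [1] * 5
print(d_value(item, scores))
```

Because the item is counted in the score used to form the groups, values obtained this way share the upward bias discussed for the correlations.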

D_s values (hereafter referred to as D values) were calculated on all
items for all 24 rate groups employing procedure 5 above. Although
this procedure produces values that are overestimates because of the presence
of the item itself in the section score, it was considered useful for the
present analysis, since the primary interest concerned the relative size
of the D values between Blacks and Whites rather than the absolute size
of the D value.

Intercorrelations among section and total test scores were calculated
for four selected rate groups.

Effects on Item-Test Correlation from Including the Item in the Score

Table A-1 presents item-score point biserial correlations for all
four alternative responses for seven selected items of the HM3 Exam,
calculated both with and without the item included in the score. The
correlations between each alternative item response and test score were
found to be higher when the item was included in the score than when it
was not included. This finding is consistent with discussions in the
general literature (e.g., Nunnally, 1967). Inclusion of the item in the
section score frequently increases the r_is of the correct response
alternative substantially (e.g., for Item 2, alternative 3, from .211 to
.424 for Blacks, and from .095 to .379 for Whites). Inclusion of the item
in the total score, however, usually increases the r_it by only .02 to .04
correlation points (e.g., for Item 130, alternative 1, from .235 to .275
for Blacks, and from .188 to .219 for Whites). The increase in r_is from
inclusion of the item in the section score is greatest for the lowest
r_is values without the item (e.g., for Whites, from .055 to .215 in Item
150, compared with .391 to .449 in Item 30), although the difference in
r_it is slight (e.g., for Whites, from .003 to .046, a difference of .043,
in Item 150, compared with a difference of .049 in Item 30).
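The part-whole inflation described above is easy to demonstrate. A point biserial correlation is a Pearson correlation with a 0/1 variable, so the Python sketch below computes the item-total correlation with and without the item in the total. In the hypothetical data, item 0 is unrelated to the remaining items, yet it appears moderately discriminating when it is counted in its own total.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def item_total_r(responses, i, include_item=True):
    """Point biserial item-total correlation for item i.

    With include_item=False the item is removed from the total first,
    which avoids the part-whole inflation discussed in the text.
    """
    item = [row[i] for row in responses]
    total = [sum(row) for row in responses]
    if not include_item:
        total = [t - x for t, x in zip(total, item)]
    return pearson(item, total)


# Hypothetical 4-examinee, 3-item test; item 0 is unrelated to items 1 and 2.
responses = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
print(round(item_total_r(responses, 0, include_item=True), 3))
print(round(item_total_r(responses, 0, include_item=False), 3))
```

With only three items the artifact is large; as the text notes, it shrinks as the number of items in the score grows, which is why the effect is much smaller for total scores than for section scores.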



In calculating D values, a similar procedure could have been applied
by dividing the group into high and low scorers for each item on the
basis of their score without that item included. This lengthy procedure
was not applied; therefore, all obtained D values can be considered to be
overestimates.



Table A-1 

Comparison of Four Methods of Calculating Item-Score Correlations 
On Seven Selected Items of the HM3 Exam 



                   Black                                          White
Item   Item in     Section (r_is)          Total (r_it)           Section (r_is)          Total (r_it)
No.    Score       1     2     3     4     1     2     3     4    1     2     3     4     1     2     3     4

2


w/ 




009 


-431 


424 


130 


-019 


-155 


160 


057 


-060 


-308/ 


37a_, . 


-046 


'029 


-101 


' Hi 


024 


w/o 




039 


-280 


2n 


1 C 0 

168 




1 in 


IOC 


UOl 




-083 


095' 






-077 


082 


028 


20 


w/ 




310 


-044 


-274 


-130 


289 


030 


'268 


-173 


486 


-202 


-406 


-037 


437 


-174 


-370 


ATT 

-032 


w/o 




216 


-009 


-219 










* 101 


4?7 


-181 


-368 


-Oil 


411 


-165 


-554 


-021 


30 


w/ 




-219 


507 


-153 


-147 


-114 


212 


-207 


-014 


-151 


449 


-198 


-322 


-150 


417 


-181 


-297 


w/o 




-187 


216 


-107 


-095 


098 


1/0 








,391. 


-164 




-14? 

J. Hi. 


392 


-166 


-282 


80 


w/ 




028 


-128 


257 


-237 


-036 


-177 


314 


-257 


-086 


-070 


332 


-264 


-081 


-077 


2n 


-210 


w/o 




051 


-121 


116 


-109 


-030 


-176 


275 


-221 


-074 


-064 


233 ^ 


-177 


-077 


-075 


243 


-186 


90 


w/ 




-017 


-119 


m 


-024 


-036 


-111 


136 


-038 


026 


-060 


253 


-206 


039 


-074 


221_ 


-170 


w/q 




004 


'■076 


-024 


075 


-030 


-098 


094 


-009 


042 


.-021 


U7 


-140 


043 


-064 


192 


-151 


130 


w/ 
w/o 




376 
248 


-103 
-040 


-168 
-120 


-173 
-139 


275 
235 


-199 
-181 


-002 
014 


-108 
-097 


313 
198 


-132 

.082 


-155 
-113 


-110 

-058 


219 
188 


-085 
-072 


-127 
-115 


-068 
-054 


150 


w 




265 


-031 


-200 


-031 


255 


-014 


-071 


-162 


215 


-128 


-on"" 


-021 


,191 


-145 


-021 


033 


w/o 




US 


053 


-190 


-018 


209 


001 


-068 


-161 




^ -036 


003 

0 


-002' 


.172 


-134 


-019 


035 



Note. Correct response is underlined. Decimal points of point biserial correlations have been omitted. 
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Comparison of Item Differentiation by Section and Total Score 



Ak expocLod, D values were found to be higher than D values. As 
^5 — t 

illustrated with 15 selected BM2 items in Table A~2 . Black D values 

— s 

exceeded JD^ values by 4 to 41 percentage points with four exceptions 

(e.g., in Item 10, the value was lower by about 10 points). Also the 

rank order of Item differentiation varied considerably both by method 
(D and D) and bv race. 

Table A-3 presents the item-score correlations, of the correct response
only, for 13 items (including the 7 items in Table A-1) from the HM3 Exam,
along with corresponding D_s values and P values.[a] The ranks (among the 13
items) of alternative item-differentiation values are quite similar across
methods (e.g., r_it and D_s, r_it and r_is, etc.) when both methods include
the item in the score, and when both methods exclude the item. However,
the ranks vary when one method with the item included is compared with
another method with the item excluded. For example, on Item 110, the White
group ranks for r_is (rank 11) and D_s (rank 12), with the item in the score,
are nearly the same, compared with the r_is rank without the item (rank 6).
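The ranks in Table A-3 give rank 1 to the highest value among the 13 items, and tied values share the mean of their positions (Items 20 and 30 tie at rank 3.5). A minimal sketch of that ranking rule follows, applied to the Black r_is values without the item in the section score as printed in Table A-3 (decimal points omitted, as in the table). The function name is hypothetical, and the averaging of tied ranks is inferred from the 3.5 entries rather than stated in the report.

```python
def rank_high_first(values):
    """Rank 1 = highest value; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions are 0-based here
        for j in range(start, end + 1):
            ranks[order[j]] = shared
        start = end + 1
    return ranks


if __name__ == "__main__":
    items = [1, 2, 3, 20, 30, 60, 70, 80, 90, 110, 130, 140, 150]
    # Black r_is without the item in the section score (Table A-3).
    r_is_without = [139, 211, 239, 216, 216, 84, 9, 116, -24, 101, 248, -9, 118]
    for item, rank in zip(items, rank_high_first(r_is_without)):
        print(item, rank)
```

The printed ranks match those shown for that column of Table A-3, including the shared rank of 3.5 for Items 20 and 30.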

Of particular interest in Table A-3 is the comparison between r_is and
r_it values (without the item included in the score). If the total test
contains sections of differing content, use of r_is may be more appropriate
than r_it (as discussed on page A-1). Tables A-4 and A-5 present intercorre-
lations among section and total scores for two exams. For example, on
the HM3 Exam (see Table A-4), section-section correlations range from
-.011 (sections 1 and 6) to .431 for Blacks, and from .019 to .648 for
Whites. Section-total correlations range from .363 to .814 for Blacks,
and from .370 to .904 for Whites. (The section-total correlations are
spuriously high, since the section is included in the total score.)



[a] The measure of item difficulty employed in this item analysis was
the P value, the percentage of a group which answers the item correctly
(i.e., as defined by Tinkelman (1971, p. 62), the lower the P value, the
more difficult the item). This measure is to be distinguished from an
alternative measure of item difficulty, the delta value, designated by the
Greek letter "Δ" and characterized by higher values associated with
more difficult items. This latter measure employs "transformed criterion
scores" of the persons attempting the item and is particularly appropriate
in tests measuring speed of performance (Conrad, 1948). Because both
Blacks and Whites tend to complete the entire test, the simpler P value
was used in the present analysis.
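The remark that section-total correlations are spuriously high can be illustrated with a small example. A section score is part of the total, so it correlates with the total even when it shares nothing with the rest of the test. The sketch below uses hypothetical scores and function names. It compares a section's correlation with the total against its correlation with the total minus that section.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def section_total_r(section, total, remove_section=False):
    """Correlate a section score with the total, optionally excluding the section."""
    if remove_section:
        total = [t - s for t, s in zip(total, section)]  # rest of the test only
    return pearson(section, total)


if __name__ == "__main__":
    # Hypothetical: two sections that are uncorrelated with each other.
    section_a = [1, 2, 3, 4]
    section_b = [2, 1, 1, 2]
    total = [a + b for a, b in zip(section_a, section_b)]
    print(round(section_total_r(section_a, total), 3))        # section in total
    print(round(section_total_r(section_a, total, True), 3))  # rest of test only
```

Even though the two toy sections are uncorrelated, the section-total correlation is about .91, while the section's correlation with the rest of the test is zero.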
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Table A-2 



Comparison of Two Methods of Calculating ' 
Item Differentiation of 15 Selected 
Items of the BM2 Exam 





On 


Section 


Score 






On 


Total 


Score 






Item 


Black 


White 




Black 






White 




NO , 


D Value 
— s 


Rank 


D Value 

~s 


Rank 


Value 


Rank 


Value 


Rank 


1 n 

1 V 


11.23 


114 


. / 1 


11 ' 


21 , 


.54 


/CD 


12. 


77 






9.53 


119 


11, / () 


r o 

So 


15, 


.60 


52 


14. 


96 


A C 




32.46 


32 


Id. 


1 no 


13, 


.55 




5. 


96 


111 
ill 




7.89 


125 


o T c n 


b3 


11, 


.28 




19. 


18 


1 Q 




33.55 


29 


bo • bD 


D 


6, 


.45 


1 no 


24. 


13 


c 

O 




9.67 


118 




13b 


-1, 


, 17 




-1. 


78 




70 


11.79 


109 


13.26 


114 


17 


.22 


44 


15. 


97 


38 


80 


11.31 


113 


17.40 


9j 


4 


.76 


112 


2. 


58 


131 


90 


9.23 


120 


10.40 


129 


9 


.38 


84 


7. 


46 


97 


100 


15.31 


96 


15.31 


105 


o 




82 


3. 


51 


126 


110 


38. 14 


13 


15.27 


106 




47 


19 


14. 


01 


52 


120 


12.^3 


105 


21.57 


71 


7 


. KjI. 


97 


11. 


09 


74 


130 


11.40 


111 


5. 14 


146 


4 


.-25 


116 


2. 


35 


134 


140 


40.77 


9 


27.68 


20 


-1 


.83 


137 


10. 


78 


76 


150 


28.72 


44 


8.48 


137 


10.77 


79 


2. 


,49 


133 



Note. Highest ^-value was assigned Rank 1. 
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Table A-3 



Cpmparison of Four Item Statistics on 
Selected Items of the HM3 Exam 



Item No. 
(and Test 
Section No.) 



0) o 
»-c CO 



Black 



White 



-IS 



^it 



D ' 

— s 



r . 

— IS 



•D 
— s 



1 


w/ 


393 


3 


158 


8 


28.34 


It 


28.9 


388 


3 


104 


9 


33.38 




18.4 


(1) 


w/o 


139 


6 


118 


8 


- 






081 


8 


072 


9 








2 


w/ 


424 


2 


160 


7 


30.58 


2 


19.2 


379 


It 


111 


8 


30.73 




.4 


(1) 


w/o 


211 


5 


125 


7 








095 


7 


082 


8 








3 


w/ 


500 


1 


119 


10 


42.95 


1 


41.4 


299 


7 


032 


13 


24.96 


5 


45.0 • 


(1) 


w/o 


239 


2 


075 


10 








OU 


1 3 


000 


13 








20 


w/ 


310 


c 


289 


2 


15.71 


7 


49.0 


486 


1 


437 


1 


43.44 


1 


59.2 


(2) 


w/o 


216 


3.5 


247 


2 








427 


1 


411 


1 








30 


w/ 


307 


6 


212 


6 


14.86 


10 


63.5 


449 


2 


417 


2 


33.57 


2 


69.7 


(2) 


w/o 


216 


3.5 


170 


6 








391 


2 


392 


2 








60 


w/ 


294 


7 


073 


12 


19.55 


5 


56.7 


239 


10 


041 


12 


19.40 


9 


52.9 


(3) 


w/o 


084 


10 


029 


12 








040 


1 1 


008 


12 








70 


w/ 


183 


12 


043 


13 


3.33 


1 3 


26.9 


223 


12 


089 


1 1 


17.28 


1 1 


32.2 


(3) 


w/o 


009 


11 


003 


13 








037 


12 


059 


1 1 








80 


w/ 


257 


9 


314 


1 


9.91 


1 1 


35.6 


332 


5 


271 


3 


23.41 


7 


30.9 


(4) 


w/o 


116 


8 


275 


1 








233 


3 


243 


3 








90 


w/ 


120 


13 


136 


9 


15.58 


8 


34.6 


253 


9 


221 


4 


17.89 


10 


35.7 


(4) 


w/o 


-024 


13 


094 


9 








147 


5 


1.92 


4 








110 


w/ 


239 


10 


098 


11 


19.44 


6 


34.6 


229 


1 1 


126 


7 


16.70 


12 


29.3 


. (5) 


w/o 


101 


9 


056 


1 1 








120 


6 


097 


7 








130 


w/ 


376 


k 


275 


3 


28.98 


3 


33.7 


313 


6 


219 


5 


24.01 


6 


42.4 


(5) 


w/o 


248 


1 


235 


3 








198 




188 


5 








140 


w/ 


210 


11 


239 




8.69 


12 


24.0 


291 


8 


092 


10 


23.10 


8 


24.3 


(6) 


w/o 


-009 


12 


202' 










069 


9 


064 


10 








150 


w/ 


265 


8 


235 


\ 


15.07 


9 


9.6 


215 


13 


191 


6 


10.48 


1 3 


10.6 


(6) 


w/o 


118 


7 


209 










055 


10 


,172 


6 









Note. Decimal points of and jr^ point biserial correlations have been 
omitted. 

a ' ^ 

Total or section score calculated: 

w/ --with item in the score 
w/p--without item in the scor^ 



The rank (among the 13 items only) of each value is indicated by the smaller 
numbers, which are in superscript, highest value with rank l (e.g., for Item 20, 
White r. of .427, calculated without the item in section score, is rank 1. 

is CJ 
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Table A-4 



Distribution Statistics, and Intercorrelations Among 



Section and Total Scores of the HM3 ^^xam 



Black 

Section 1 2 3 4 5 6 Total 



1 308 001 122 183 -Oil 383 

2 283 364 431 114 ' 814 

3 163 152 158 456 

4 .419 248 684 

5 101 701 

6 363 

Mean 4.61 21.3^ 9.98 15.29 10.64 5.71 68.00 

S.D. 1.73 4.54 2.33 3.32 3.41 1.96 11.17 



White 

Section 1 2 3 4 5 6 Total 



1 273 158 192 219 109 370 

2 433 629 648 ' 255 904 

3 35^ 352 142) 573 

4 542 215 797^ 

5 233 80dt 

6 388 

Mean 5.15 21.22 10.58 16.63 11.63 6.19 73.45 

S.D 1.60 5.29 2.50 4.37 4.07 1.92 15.53 



Note. Decimal points for correlation^ have been 
omitted. 
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Table A-5 

Distribution Statistics and Intercorrelations Among 
Section and Total Scores of the &M£ Exam 



Black 



Section 


1 


2 


3 


4 


5. 


6 


7 


8 


Total 


1 




236 


191 


363 


349 


200 


057 


■ 374 


550 


2 






432 


241 


295 


190 


357 


404 


724 


3 








253 


294 


438 


320 


159 


721 


4 










' 322 


234 


259 


284 


582 


5 












125 


055 


274 


544 


6 














305 


191 


543 


7 
















017 


50i 


8 


















527 


Mean 


7.47 


13.58 


12.35 


5.39 


' 5.64 


5.32 


- 5^39 


4.97 


60.12 


S. D. 


2.31 


3.45 


3.59 


2.09 


1.96 


ll95 


2.09 


1.79 


11.70 



Section 




1 


2 


3 


White 
4 5 


6 


7 


8 


Total 
























1 






257 


239 


204 


174 


157 


242 


217 


564 


2 








354 


224 


150 


221 


289 


144 


650 


3 




i 






240 


239 


201 


273 


255 


694 


4 




\ 








152 


082 


236 


163 


515 


5 














093 


146 


201 


452 


6 
















172 


137 


421 


7 


















256 


580 


8 




















500 


Mean 


8. 


26 


13.71 


12.63^ 


5.78 


5.78 


5.83 


6.07 


5.31 


63.43 


S.D. 


2. 


39 


3.02 


3.25 


2.23 


1.93 


1.75 


2.18 


1.95 


10.56 



— ^ i 

Note. Decimal points for correlations have been omitted. 
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It might be reasonable to assume that, if the section-total correla- 
tion is low, _r^^ would be higher than and more appropriate than r^^ (if 

the section content is assumed to be homogenous). However, these assump- 
tions are not supported by the few illustrative items of the HM3 Exam in 
Table A-3. For example, for Blacks, r^^ is higher than on the two items 

(140 and 150) from section 6, although this section had the lowest section- 
total correlation (.363 in Table A-4). Of the two items (20 and 30 in 
Table A-3) from the section with the highest section-total correlation 
(.814 in Table A-4) , one r^^ is higher, and the other is lower than I.^^- 

In the light of the varying differences between r_is and r_it, and among
section-total correlations (including, quite likely, even sections of hetero-
geneous content), the most useful measure of item differentiation generally
appears to be r_it (without the item included in the total score). (Nonethe-
less, use of D_s with the item in the section score is considered useful and
adequate for analyzing the relative differences between racial groups
in the present study.)

Relationship Between P and D Values 

When the corresponding P values for the highest D values were examined
(see Table 4 and page 7), the median P value of the items with the highest
D values was generally higher than the total median P value. Similar results
were also obtained with the corresponding P values for the highest r_it values
in Table A-6. With one exception (the HM3 Black group), these corresponding
P values are higher than the total test median P value. For example, the
corresponding median P value, 54.19, for the highest r_it values of the
ADJ3 White group is substantially higher than the total test median P
value, 45.81, for that group.
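The procedure behind Table A-6 can be sketched as follows: rank the items by r_it, take the nine highest and the nine lowest, and report the range and median of r_it and of the corresponding P values in each set. This is a minimal illustration with hypothetical data and function names. The set size is a parameter (nine in Table A-6), and the two sets would overlap if a test had fewer than twice that many items.

```python
from statistics import median


def extreme_r_sets(r_it, p, k=9):
    """Range and median of r_it and of P for the k highest- and k lowest-r_it items."""
    order = sorted(range(len(r_it)), key=lambda i: r_it[i], reverse=True)
    summary = {}
    for label, idx in (("highest", order[:k]), ("lowest", order[-k:])):
        rs = [r_it[i] for i in idx]
        ps = [p[i] for i in idx]
        summary[label] = {
            "r_range": (min(rs), max(rs)),
            "r_median": median(rs),
            "p_range": (min(ps), max(ps)),
            "p_median": median(ps),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical seven-item test; sets of three for illustration.
    r_it = [0.40, 0.05, 0.30, -0.10, 0.20, 0.00, 0.35]
    p = [60.0, 20.0, 55.0, 15.0, 40.0, 25.0, 70.0]
    for label, stats in extreme_r_sets(r_it, p, k=3).items():
        print(label, stats)
    print("median P, all items:", median(p))
```

In this toy case the median P of the high-r_it set (60.0) exceeds the median P of all items (40.0). This is the same kind of comparison made above for the ADJ3 White group.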

Table A-3 also provides examples of high P values which yield high or
low differentiation values (e.g., for the White group, Item 20 has a P value
of 59.2 with an r_it, without the item in the score, of .411, but Item 60 has a
P value of 52.9 with an r_it of only .008), and of low P values which yield
high or low differentiation values (e.g., Item 80 has a P value of 30.9 with
an r_it of .243, but Item 70 has a P value of 32.2 with an r_it of only .059).

Reversing the orientation and comparing P values with corresponding
D values yielded similar results (see Table A-7). The P values of middle
difficulty (e.g., ADJ3 Black group, median P value of 34.04) yield corre-
sponding D values (e.g., 41.12) which are substantially lower than the
highest D values (e.g., ADJ3 Black median, 54.41, in Table 4, page 10).
Figures A-1, A-2, and A-3 display the median P values and corresponding
D values for the 7-item ranked sets of items in Table A-7. It may be
observed that the highest P values yield corresponding D values which are
higher than the corresponding D values of the lowest P values, for both
Blacks and Whites.
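Table A-7 reverses the orientation. Items are ranked by P, and the median D value is taken within the highest, middle, and lowest sets of seven. The sketch below assumes the middle set is the k items centred in the ranked list, which the report does not spell out. The data and function name are hypothetical.

```python
from statistics import median


def p_ranked_sets(p, d, k=7):
    """Median P and median corresponding D for the highest, middle, and lowest P sets."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    mid_start = (len(order) - k) // 2  # assumed definition of the middle set
    sets = {
        "highest": order[:k],
        "middle": order[mid_start:mid_start + k],
        "lowest": order[-k:],
    }
    return {
        name: (median(p[i] for i in idx), median(d[i] for i in idx))
        for name, idx in sets.items()
    }


if __name__ == "__main__":
    # Hypothetical nine-item test; sets of three for illustration.
    p = [90, 80, 70, 60, 50, 40, 30, 20, 10]
    d = [30, 28, 26, 35, 40, 33, 10, 8, 5]
    for name, (p_med, d_med) in p_ranked_sets(p, d, k=3).items():
        print(f"{name}: median P {p_med}, median D {d_med}")
```

With these toy numbers the easiest set's median D (28) exceeds the hardest set's (8), which is the comparison plotted in Figures A-1 to A-3.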



Table A-6

Range and Median of Nine-Item Sets of Highest and Lowest r_it Values
and Corresponding P Values for Four Rate Groups

Black

             Ranked r_it                Corresponding P
Rate   Set   Range             Median   Range             Median
ADJ3   H     .440 to .574      .474     21.28 to 51.06    38.30
       L     -.185 to -.108    -.166    17.02 to 57.45    36.17
--     H     -- to .511        .371     24.04 to 69.23    --
       L     -.253 to -.032    -.117    7.69 to 59.62     29.53
--     H     .418 to .583      .458     22.41 to 51.72    36.21
       L     -.199 to -.071    -.147    15.52 to 70.69    36.21
--     H     .364 to .559      .398     17.57 to 70.27    47.30
       L     -.150 to -.027    -.079    9.46 to 68.92     20.72

White

             Ranked r_it                Corresponding P
Rate   Set   Range             Median   Range             Median
ADJ3   H     .376 to .501      .384     29.04 to 81.83    54.19
       L     -.034 to .021     -.019    10.09 to 53.11    38.51
--     H     .436 to .481      .446     41.15 to 72.92    58.43
       L     -.092 to .011     -.015    7.42 to 77.75     32.26
--     H     .438 to .493      .467     44.00 to 66.16    50.28
       L     -.083 to .041     --       9.13 to 49.09     30.18
--     H     .266 to .370      .287     35.68 to 62.74    49.03
       L     -.088 to -.021    -.053    5.10 to 35.33     11.95

Note. H = nine items with the highest r_it values; L = nine items with the
lowest. -- = illegible in the source.



Table A-7

Range and Median of Seven-Item Ranked Sets of Highest, Middle,
and Lowest P Values and Their Corresponding D Values

Black

             Highest                 Middle                  Lowest
Rate  Value  Range           Median  Range           Median  Range           Median
ADJ3  P      57.45 to 78.72  --      34.04 to 34.04  34.04   8.51 to 14.89   12.77
      D      20.73 to 45.82  27.36   -1.15 to 44.02  41.12   -7.51 to 9.88   4.78
HM3   P      76.92 to 95.19  81.73   44.23 to 46.15  45.19   5.77 to --      7.69
      D      1.90 to 26.58   15.30   14.88 to 46.40  20.56   -2.42 to 15.02  9.29
MM3   P      70.69 to 82.76  74.14   39.66 to 39.66  39.66   6.90 to 15.52   15.52
      D      5.71 to 29.52   18.99   13.10 to 28.73  14.52   -.24 to 20.97   6.19

White

             Highest                 Middle                  Lowest
Rate  Value  Range           Median  Range           Median  Range           Median
ADJ3  P      75.31 to 83.23  79.81   45.34 to 47.36  46.74   10.09 to 18.32  14.44
      D      20.70 to 25.09  22.74   14.53 to 45.53  26.24   5.46 to 16.45   13.18
HM3   P      80.97 to 96.36  86.00   48.92 to 50.80  49.69   7.42 to --      8.75
      D      4.93 to 21.67   12.15   17.58 to 36.88  30.41   5.90 to 12.88   9.26
MM3   P      72.99 to 88.64  79.03   47.74 to 49.17  48.77   9.13 to 15.89   13.34
      D      4.44 to 27.06   19.30   20.80 to 39.93  28.10   .35 to 14.88    10.12

Note. -- = illegible in the source.
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SEVEN-ITEM SETS EXTRACTED FROM RANKEOi VALUES 



uir A- 1, Median values and corresponding D values by race 
(AI)J3 Kxam). 
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SEVEN-ITEM SETS EXTRACTED FROM RANKED iL VALUES 



iKuro Modian values and corresponding ^ values by race 

(I1M3 Hxam) . 
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igure A-3. Median values and corresponding _D values by race 
(MM3 Exam). 
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FlndlnRs 

In the methodological comparisons of alternative measures of item 
difficulty, the Item-score correlations, with the item iaclud6d in the 
score were greater than without the item included. With the item in 
the score, the item -total correlation d^^) was greater by about ,02 

to .OA correlation points, and the item -section correlation (r. ) was 

Iji — ^—18 

greater by large and varying amounts. 

In compkrisons of item-section and item-tot;al measures, the percentage 
difference between high and low scorers answering the item correctly was 
higher on the item -section percentages (D^ values) than on the item -total 

percentages (D^ values) with the item inducted in both scores. 

Item-section (l.^^) and item-total (JL^^) correlations, without the 

item included in the score of either, varied as to which was the larger. 

Section score i^tercorrelatlons within each total test varied from low to 

high values, suggesting sotne heterogeneity in some tests or some sections 

of ,tests, (Heterogeneity would tdhd to reduce r. or r,^ values,) In 

— is "T-t 

light of these varying differences, the most useful measure of item differ- 
entiation appears to be jr^^ without the item included in the total score. 

The values which corresponded to the highest D valuea or r^^ values 

were higher than the median values for the total tests, suggesting that 
easier items might improve item differentiation. In the comparison of the 
ends of the JP value ranges, the highest JP values (i,e,, easiest items) had 
corresppnding values which were higher than the corresponding £ values 
pf the lowest values, which suggests that the diff icu^t items are 
excessively difficult. 